Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks
Authors
Abstract
We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient both statistically and computationally. We show that, despite its simplicity, the estimator outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropy-based gene association network from them. A computer program implementing the proposed shrinkage estimator is available.
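The shrinkage idea summarized above can be illustrated with a short Python sketch. It assumes, as we understand the JMLR paper to do, a uniform target t_k = 1/p for the p cell frequencies and the closed-form intensity lambda = (1 - sum_k theta_k^2) / ((n - 1) * sum_k (t_k - theta_k)^2), truncated to [0, 1], where theta_k = y_k / n are the maximum-likelihood frequencies from counts y_k with total n. Mutual information is obtained here by shrinking the joint cell frequencies of a two-way contingency table and plugging them into the usual formula. The function names (`shrink_freqs`, `entropy_plugin`, `entropy_shrink`, `mi_shrink`) are hypothetical and are not the interface of the authors' program.

```python
import math
from typing import List, Sequence, Tuple


def shrink_freqs(counts: Sequence[float]) -> Tuple[List[float], float]:
    """Shrink ML cell frequencies toward the uniform target t_k = 1/p.

    Returns the shrunk frequencies and the shrinkage intensity lambda in [0, 1].
    """
    n = float(sum(counts))
    p = len(counts)
    if p == 0 or n <= 0:
        raise ValueError("need at least one cell and a positive total count")
    theta_ml = [c / n for c in counts]  # maximum-likelihood (plug-in) frequencies
    t = 1.0 / p  # uniform shrinkage target (assumed target of this sketch)
    msp = sum((t - th) ** 2 for th in theta_ml)  # squared distance to the target
    if n <= 1 or msp == 0.0:
        # Assumption of this sketch: with a single observation, or with data
        # already at the target, shrink fully to the target.
        lam = 1.0
    else:
        lam = (1.0 - sum(th * th for th in theta_ml)) / ((n - 1.0) * msp)
        lam = min(1.0, max(0.0, lam))  # truncate the intensity to [0, 1]
    return [lam * t + (1.0 - lam) * th for th in theta_ml], lam


def entropy_plugin(freqs: Sequence[float]) -> float:
    """Shannon entropy in nats, using the convention 0 * log 0 = 0."""
    return -sum(f * math.log(f) for f in freqs if f > 0)


def entropy_shrink(counts: Sequence[float]) -> float:
    """Plug-in entropy evaluated on the shrunk frequencies."""
    freqs, _ = shrink_freqs(counts)
    return entropy_plugin(freqs)


def mi_shrink(table: Sequence[Sequence[float]]) -> float:
    """Mutual information (nats) of a 2-D table of counts with equal-length rows,
    computed from shrunk joint cell frequencies."""
    rows, cols = len(table), len(table[0])
    flat, _ = shrink_freqs([c for row in table for c in row])
    joint = [flat[i * cols:(i + 1) * cols] for i in range(rows)]
    row_m = [sum(r) for r in joint]  # marginal of the row variable
    col_m = [sum(joint[i][j] for i in range(rows)) for j in range(cols)]
    return sum(
        joint[i][j] * math.log(joint[i][j] / (row_m[i] * col_m[j]))
        for i in range(rows)
        for j in range(cols)
        if joint[i][j] > 0
    )


if __name__ == "__main__":
    freqs, lam = shrink_freqs([2, 1, 0, 0])
    print(lam)  # 8/11, about 0.727
    print(entropy_shrink([2, 1, 0, 0]), mi_shrink([[10, 0], [0, 10]]))
```

For example, `shrink_freqs([2, 1, 0, 0])` gives lambda = 8/11, so with only three observations spread over four cells most of the weight moves to the uniform target. Because entropy is concave and maximal at the uniform distribution, shrinking toward that target can only raise the plug-in entropy, which counteracts the well-known downward bias of the maximum-likelihood entropy estimate in small samples.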
Similar resources
Entropy Inference and the James-Stein Estimator
Entropy is a fundamental quantity in statistics and machine learning. In this note, we present a novel procedure for statistical learning of entropy from high-dimensional small-sample data. Specifically, we introduce a simple yet very powerful small-sample estimator of the Shannon entropy based on James-Stein-type shrinkage. This results in an estimator that is highly efficient statistically ...
Comparison of Small Area Estimation Methods for Estimating Unemployment Rate
Extended Abstract. In recent years, the need for small area estimation has increased greatly for large surveys, particularly household surveys at the Statistical Centre of Iran (SCI), because of costs and respondent burden. The lack of suitable auxiliary variables between two decennial housing and population censuses is a challenge for SCI in using these methods. In general, the...
Estimation of the Multivariate Normal Mean under the Extended Reflected Normal Loss Function
Estimation of the proteomic cancer co-expression sub networks by using association estimators
In this study, association estimators, which strongly influence gene network inference methods and are used to determine molecular interactions, were examined in the context of co-expression network inference. Using proteomic data from five different cancer types, the hub genes/proteins within the disease-associated gene-gene/protein-protein interaction sub networ...
Evaluation of the Efficiency of the Adaptive Neuro Fuzzy Inference System (ANFIS) in the Modeling of the Ionosphere Total Electron Content Time Series Case Study: Tehran Permanent GPS Station
Global positioning system (GPS) measurements provide accurate and continuous 3-dimensional position, velocity and time data anywhere on or above the surface of the earth, at any time, and in all weather conditions. However, the predominant ranging error source for GPS signals is the ionospheric error. The ionosphere is the region of the atmosphere from about 60 km to more than 1500 km above the eart...
Journal title:
- Journal of Machine Learning Research
Volume: 10, Issue:
Pages: -
Publication date: 2009